Obtaining Predictions from Models Fit to Multiply Imputed Data
نویسنده
چکیده
Obtaining predictions from regression models fit to multiply imputed data can be challenging because treatments of multiple imputation seldom give clear guidance on how predictions can be calculated, and because available software often does not have built-in routines for performing the necessary calculations. This research note reviews how predictions can be obtained using Rubin’s rules, that is, by being estimated separately in each imputed data set and then combined. It then demonstrates that predictions can also be calculated directly from the final analysis model. Both approaches yield identical results when predictions rely solely on linear transformations of the coefficients and calculate standard errors using the delta method and diverge only slightly when using nonlinear transformations. However, calculation from the final model is faster, easier to implement, and generates predictions with a clearer relationship to model coefficients. These principles are illustrated using data from the General Social Survey and with a simulation.
منابع مشابه
The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data
Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply i...
متن کاملValidation of prediction models based on lasso regression with multiply imputed data
BACKGROUND In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some o...
متن کاملMultiple imputation and other resampling schemes for imputing missing observations
The problem of imputing missing observations under the linear regression model is considered. It is assumed that observations are missing at random and all the observations on the auxiliary or independent variables are available. Estimates of the regression parameters based on singly and multiply imputed values are given. Jackknife as well as bootstrap estimates of the variance of the singly im...
متن کاملاهمیت خویشاوندی ژنتیکی و رکورد فنوتیپی بر صحت ژنومی دادههای جانهی شبیه سازی شده با استفاده از مدل های حیوانی در حضور اثرات متقابل ژنوتیپ و محیط
The objective of this study was to investigate the role of genetic relationships between training and validation set with considering different ratio of phenotypic records of training set on accuracy of genomic prediction via animal models containing genotype × environment interactions in simulated imputation data. For this purpose, four different scenarios using 15k density containing differen...
متن کاملGoodness-of-fit Tests for Archimedean Copula Models
In this paper, we propose two tests for parametric models belonging to the Archimedean copula family, one for uncensored bivariate data and the other one for right-censored bivariate data. Our test procedures are based on the Fisher transform of the correlation coefficient of a bivariate (U, V ), which is a one-toone transform of the original random pair (T1, T2) that can be modeled by an Archi...
متن کامل